Continuous Articulatory-to-Acoustic Mapping using Phone-based Trajectory HMM for a Silent Speech Interface
Abstract
The article presents an HMM-based mapping approach for converting ultrasound and video images of the vocal tract into an audible speech signal, for a silent speech interface application. The proposed technique is based on the joint modeling of articulatory and spectral features, for each phonetic class, using Hidden Markov Models (HMMs) and multivariate Gaussian distributions with full covariance matrices. The articulatory-to-acoustic mapping is achieved in two steps: 1) finding the most likely HMM state sequence from the articulatory observations; 2) inferring the spectral trajectories from both the decoded state sequence and the articulatory observations. The proposed technique is compared to our previous approach, in which only the decoded state sequence was used for the inference of the spectral trajectories, independently of the articulatory observations. Both objective and perceptual evaluations show that this new approach leads to a better estimation of the spectral trajectories.
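Step 2 of the mapping relies on the fact that, under each state's joint full-covariance Gaussian over stacked articulatory and spectral features, the spectral features given the articulatory observation have a closed-form conditional mean. The sketch below (hypothetical function names and data layout; the full system also imposes trajectory constraints via dynamic features, which this omits) illustrates that per-frame conditional inference, assuming step 1 has already produced a decoded state sequence:

```python
import numpy as np

def conditional_spectral_mean(mu, sigma, x, d_art):
    """E[spectral | articulatory = x] under one state's joint Gaussian.

    mu    : mean of the stacked [articulatory; spectral] vector
    sigma : full covariance of the stacked vector
    x     : articulatory observation for the current frame
    d_art : dimensionality of the articulatory part
    """
    mu_x, mu_y = mu[:d_art], mu[d_art:]
    s_xx = sigma[:d_art, :d_art]          # articulatory block
    s_yx = sigma[d_art:, :d_art]          # cross-covariance block
    # Gaussian conditioning: mu_y + S_yx S_xx^{-1} (x - mu_x)
    return mu_y + s_yx @ np.linalg.solve(s_xx, x - mu_x)

def map_articulatory_to_spectral(obs, state_seq, means, covs, d_art):
    """Infer the spectral trajectory frame by frame from the decoded
    state sequence (step 1 output) and the articulatory observations."""
    return np.array([
        conditional_spectral_mean(means[q], covs[q], x, d_art)
        for x, q in zip(obs, state_seq)
    ])
```

Using only the state-dependent mean `mu_y` (ignoring the correction term) corresponds to the previous approach the article compares against, where the spectral trajectory depends on the decoded state sequence alone.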
Related papers
Statistical Mapping Between Articulatory and Acoustic Data for an Ultrasound-Based Silent Speech Interface
This paper presents recent developments on our “silent speech interface” that converts tongue and lip motions, captured by ultrasound and video imaging, into audible speech. In our previous studies, the mapping between the observed articulatory movements and the resulting speech sound was achieved using a unit selection approach. We investigate here the use of statistical mapping techniques, ba...
Cross-speaker Acoustic-to-Articulatory Inversion using Phone-based Trajectory HMM for Pronunciation Training
The article presents a statistical mapping approach for cross-speaker acoustic-to-articulatory inversion. The goal is to estimate the most likely articulatory trajectories for a reference speaker from the speech audio signal of another speaker. This approach is developed in the framework of our system of visual articulatory feedback developed for computer-assisted pronunciation training applicat...
Continuous-speech phone recognition from ultrasound and optical images of the tongue and lips
The article describes a video-only speech recognition system for a “silent speech interface” application, using ultrasound and optical images of the voice organ. A one-hour audiovisual speech corpus was phonetically labeled using an automatic speech alignment procedure and robust visual feature extraction techniques. HMM-based stochastic models were estimated separately on the visual and acoust...
Acoustic-to-articulatory inverse mapping using an HMM-based speech production model
We present a method that determines articulatory movements from speech acoustics using an HMM (Hidden Markov Model)-based speech production model. The model statistically generates speech acoustics and articulatory movements from a given phonemic string. It consists of HMMs of articulatory movements for each phoneme and an articulatory-to-acoustic mapping for each HMM state. For a given speech ...
Acoustic-to-articulatory inversion using a speaker-normalized HMM-based speech production model
Acoustic-to-articulatory inverse mapping is a difficult problem because of its non-linear and one-to-many characteristics. We have previously developed a speech inversion method using a hidden Markov model (HMM)-based speech production model which takes into account the phoneme-specific dynamic constraints of articulatory parameters. We found that the constraint significantly decreases the estima...